Blind Quality Assessment for in-the-Wild Images via Hierarchical Feature Fusion and Iterative Mixed Database Training
Image quality assessment (IQA) is very important for both end-users and
service-providers since a high-quality image can significantly improve the
user's quality of experience (QoE) and also benefit lots of computer vision
algorithms. Most existing blind image quality assessment (BIQA) models were developed for synthetically distorted images; however, they perform poorly on in-the-wild images, which are widespread in practical applications.
In this paper, we propose a novel BIQA model for in-the-wild images by
addressing two critical problems in this field: how to learn better
quality-aware feature representation, and how to solve the problem of
insufficient training samples in terms of their content and distortion
diversity. Considering that perceptual visual quality is affected by both
low-level visual features (e.g. distortions) and high-level semantic
information (e.g. content), we first propose a staircase structure to
hierarchically integrate the features from intermediate layers into the final
feature representation, which enables the model to make full use of visual
information from low-level to high-level. Then an iterative mixed database
training (IMDT) strategy is proposed to train the BIQA model on multiple
databases simultaneously, so that the model benefits from more training samples with greater content and distortion diversity and learns a more general feature representation. Experimental results show that the
proposed model outperforms other state-of-the-art BIQA models on six
in-the-wild IQA databases by a large margin. Moreover, the proposed model shows
an excellent performance in the cross-database evaluation experiments, which
further demonstrates that the learned feature representation is robust to
images with diverse distortions and content. The code will be released publicly for reproducible research.
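As a rough illustration of the hierarchical-fusion idea (not the paper's exact staircase structure), the sketch below pools the output of every stage of a backbone and concatenates the pooled features before regression; the ResNet-50 backbone, layer names, and regression head are assumptions made for the example.

```python
# Minimal sketch (not the authors' code): hierarchical fusion of
# intermediate-layer features for quality prediction, assuming a
# torchvision ResNet-50 backbone. The exact staircase design differs.
import torch
import torch.nn as nn
import torchvision.models as models


class HierarchicalBIQA(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        # Keep the four residual stages so intermediate features stay exposed.
        self.stages = nn.ModuleList(
            [backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4])
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Channel widths of the ResNet-50 stages: 256, 512, 1024, 2048.
        self.head = nn.Linear(256 + 512 + 1024 + 2048, 1)

    def forward(self, x):
        x = self.stem(x)
        feats = []
        for stage in self.stages:
            x = stage(x)
            # Pool each stage so low-level and high-level information
            # both reach the final feature representation.
            feats.append(self.pool(x).flatten(1))
        return self.head(torch.cat(feats, dim=1))  # scalar quality score


scores = HierarchicalBIQA()(torch.randn(2, 3, 224, 224))  # shape (2, 1)
```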
How is Gaze Influenced by Image Transformations? Dataset and Model
Data size is the bottleneck for developing deep saliency models, because collecting eye-movement data is very time-consuming and expensive. Most current studies on human attention and saliency modeling have used high-quality, stereotypical stimuli. In the real world, however, captured images undergo various types of transformations. Can we use these transformations to augment existing saliency datasets? Here, we first create a novel saliency dataset including
fixations of 10 observers over 1900 images degraded by 19 types of
transformations. Second, by analyzing eye movements, we find that observers
look at different locations over transformed versus original images. Third, we utilize the new data over transformed images, referred to as data augmentation transformations (DATs), to train deep saliency models. We find that label-preserving DATs with negligible impact on human gaze boost saliency prediction, whereas other DATs that severely alter human gaze degrade performance. These valid, label-preserving augmentation transformations provide a way to enlarge existing saliency datasets. Finally, we introduce a novel
saliency model based on a generative adversarial network (dubbed GazeGAN). A
modified UNet is proposed as the generator of the GazeGAN, which combines
classic skip connections with a novel center-surround connection (CSC), in order to leverage multi-level features. We also propose a histogram loss based on the Alternative Chi-Square Distance (ACS HistLoss) to refine the saliency map in
terms of luminance distribution. Extensive experiments and comparisons over 3
datasets indicate that GazeGAN achieves the best performance in terms of
popular saliency evaluation metrics, and is more robust to various
perturbations. Our code and data are available at:
https://github.com/CZHQuality/Sal-CFS-GAN
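For readers unfamiliar with the Alternative Chi-Square distance, the sketch below compares the luminance histograms of two maps with a common form of that distance; it is only illustrative, since the exact ACS HistLoss formulation is defined in the paper and torch.histc, used here for brevity, is not differentiable (a training loss would need soft histogram binning).

```python
# Minimal sketch (an assumption, not the authors' implementation): an
# alternative chi-square distance between normalized luminance histograms.
import torch


def acs_histogram_distance(pred, target, bins=256, eps=1e-8):
    """Compare the luminance distributions of two maps with values in [0, 1]."""
    h_pred = torch.histc(pred, bins=bins, min=0.0, max=1.0)
    h_tgt = torch.histc(target, bins=bins, min=0.0, max=1.0)
    h_pred = h_pred / (h_pred.sum() + eps)
    h_tgt = h_tgt / (h_tgt.sum() + eps)
    # Alternative chi-square form: 2 * sum((p - q)^2 / (p + q)).
    return 2.0 * torch.sum((h_pred - h_tgt) ** 2 / (h_pred + h_tgt + eps))
```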
Geometry-Aware Video Quality Assessment for Dynamic Digital Human
Dynamic Digital Humans (DDHs) are 3D digital models that are animated using predefined motions. They are inevitably affected by noise and geometric shift during generation and by compression distortion during transmission, and these degradations need to be perceptually evaluated. DDHs are usually displayed as 2D rendered animation videos, so it is natural to adapt video quality assessment (VQA) methods to the DDH quality assessment (DDH-QA) task. However, VQA methods are highly dependent on viewpoints and less sensitive to geometry-based distortions. Therefore, in this paper, we propose a novel no-reference (NR) geometry-aware video quality assessment method for the DDH-QA challenge. Geometry
characteristics are described by the statistical parameters estimated from the
DDHs' geometry attribute distributions. Spatial and temporal features are
acquired from the rendered videos. Finally, all kinds of features are
integrated and regressed into quality values. Experimental results show that the proposed method achieves state-of-the-art performance on the DDH-QA database.
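A minimal sketch of what such geometry statistics could look like is given below, assuming a hypothetical per-vertex attribute (e.g. curvature); the statistical parameters actually used in the paper may differ.

```python
# Minimal sketch, not the paper's feature set: statistical parameters
# estimated from one geometry attribute distribution of a DDH mesh.
import numpy as np
from scipy import stats


def geometry_statistics(attribute_values, bins=64):
    """Summarize one geometry attribute (e.g. per-vertex curvature)."""
    v = np.asarray(attribute_values, dtype=np.float64)
    hist, _ = np.histogram(v, bins=bins, density=True)
    hist = hist / (hist.sum() + 1e-12)
    return {
        "mean": v.mean(),
        "std": v.std(),
        "skewness": stats.skew(v),
        "kurtosis": stats.kurtosis(v),
        "entropy": stats.entropy(hist + 1e-12),
    }
```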
Simple Baselines for Projection-based Full-reference and No-reference Point Cloud Quality Assessment
Point clouds are widely used in 3D content representation and have various
applications in multimedia. However, compression and simplification processes
inevitably result in the loss of quality-aware information under storage and
bandwidth constraints. Therefore, there is an increasing need for effective
methods to quantify the degree of distortion in point clouds. In this paper, we
propose simple baselines for projection-based point cloud quality assessment
(PCQA) to tackle this challenge. We use multi-projections obtained via a common
cube-like projection process from the point clouds for both full-reference (FR)
and no-reference (NR) PCQA tasks. Quality-aware features are extracted with
popular vision backbones. The FR quality representation is computed as the
similarity between the feature maps of reference and distorted projections
while the NR quality representation is obtained by simply squeezing the feature
maps of distorted projections with average pooling The corresponding quality
representations are regressed into visual quality scores by fully-connected
layers. Taking part in the ICIP 2023 PCVQA Challenge, we achieved the top spot in four of the five competition tracks.
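The sketch below illustrates the two quality representations described above, assuming a torchvision ResNet-18 feature extractor and channel-wise cosine similarity; the actual challenge submissions may use different backbones and similarity measures.

```python
# Minimal sketch (assumptions noted above): FR and NR quality
# representations computed from projection feature maps.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

backbone = models.resnet18(weights=None)
features = nn.Sequential(*list(backbone.children())[:-2])  # keep conv feature maps


def fr_representation(ref_proj, dist_proj):
    # FR: similarity between reference and distorted feature maps,
    # here per-channel cosine similarity over spatial positions.
    f_ref = features(ref_proj).flatten(2)    # (B, C, H*W)
    f_dist = features(dist_proj).flatten(2)
    return F.cosine_similarity(f_ref, f_dist, dim=2)  # (B, C) quality-aware vector


def nr_representation(dist_proj):
    # NR: squeeze the distorted feature maps with average pooling.
    return features(dist_proj).mean(dim=(2, 3))        # (B, C)
```

Either representation can then be fed to fully-connected layers for score regression, as the abstract describes.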
Perceptual Quality Assessment for Digital Human Heads
Digital humans have attracted increasing research interest over the last decade, with considerable effort devoted to their generation, representation, rendering, and animation. However, the quality assessment of digital humans has fallen behind. Therefore, to tackle the challenge of digital human quality assessment, we propose the first large-scale quality
assessment database for three-dimensional (3D) scanned digital human heads
(DHHs). The constructed database consists of 55 reference DHHs and 1,540
distorted DHHs along with the subjective perceptual ratings. Then, a simple yet
effective full-reference (FR) projection-based method is proposed to evaluate
the visual quality of DHHs. The pretrained Swin Transformer (tiny) is employed
for hierarchical feature extraction and the multi-head attention module is
utilized for feature fusion. The experimental results reveal that the proposed method exhibits state-of-the-art performance among mainstream FR metrics, providing an effective FR-IQA index for DHHs.
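The sketch below shows one way to combine hierarchical Swin-T features with multi-head attention fusion, using torchvision's Swin-T; the projection rendering, the full-reference comparison against the reference DHH, and the exact fusion design of the paper are omitted or assumed here.

```python
# Minimal sketch (an assumption, not the paper's exact architecture):
# hierarchical Swin-T features fused by multi-head attention, then
# regressed to a quality score for a single rendered projection.
import torch
import torch.nn as nn
import torchvision.models as models


class DHHQualityNet(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.backbone = models.swin_t(weights=None).features
        # Project each stage (96/192/384/768 channels in Swin-T) to a common width.
        self.proj = nn.ModuleList([nn.Linear(c, dim) for c in (96, 192, 384, 768)])
        self.fusion = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self, x):
        tokens = []
        for i, stage in enumerate(self.backbone):
            x = stage(x)
            if i in (1, 3, 5, 7):                 # the four Swin stages, (B, H, W, C)
                pooled = x.mean(dim=(1, 2))       # global average pooling
                tokens.append(self.proj[i // 2](pooled))
        t = torch.stack(tokens, dim=1)            # (B, 4, dim)
        fused, _ = self.fusion(t, t, t)           # self-attention fusion
        return self.head(fused.mean(dim=1))       # scalar quality score
```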
A No-Reference Quality Assessment Method for Digital Human Head
In recent years, digital humans have been widely applied in augmented/virtual
reality (A/VR), where viewers are allowed to freely observe and interact with
the volumetric content. However, digital humans may be degraded by various distortions during generation and transmission. Moreover, little effort has been put into the perceptual quality assessment of digital humans. Therefore, there is an urgent need for objective methods to tackle the challenge of digital human quality assessment (DHQA). In this paper, we develop a novel no-reference (NR) method based on
the Transformer architecture to deal with DHQA in a multi-task manner. Specifically, the frontal 2D projections of the digital humans are rendered as inputs and a vision transformer (ViT) is employed for feature extraction. Then we design a
multi-task module to jointly classify the distortion types and predict the
perceptual quality levels of digital humans. The experimental results show that the proposed method correlates well with the subjective ratings and outperforms state-of-the-art quality assessment methods.
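A minimal sketch of the multi-task idea follows, assuming a torchvision ViT-B/16 backbone and a hypothetical number of distortion classes; it is not the authors' released model.

```python
# Minimal sketch (hypothetical, not the authors' model): a shared ViT
# backbone with one head classifying the distortion type and another
# regressing the perceptual quality level, trained jointly.
import torch
import torch.nn as nn
import torchvision.models as models


class MultiTaskDHQA(nn.Module):
    def __init__(self, num_distortions=5):   # class count is an assumption
        super().__init__()
        self.backbone = models.vit_b_16(weights=None)
        self.backbone.heads = nn.Identity()   # expose the 768-d CLS feature
        self.distortion_head = nn.Linear(768, num_distortions)
        self.quality_head = nn.Linear(768, 1)

    def forward(self, projection):
        feat = self.backbone(projection)
        return self.distortion_head(feat), self.quality_head(feat)


# Joint objective: cross-entropy for distortion type plus L1 on quality.
model = MultiTaskDHQA()
logits, quality = model(torch.randn(2, 3, 224, 224))
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 3])) + \
       nn.L1Loss()(quality.squeeze(1), torch.tensor([3.2, 1.7]))
```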
Saliency in Augmented Reality
With the rapid development of multimedia technology, Augmented Reality (AR)
has become a promising next-generation mobile platform. The principle underlying AR is human visual confusion, which allows users to perceive real-world scenes and augmented contents (virtual-world scenes) simultaneously by superimposing them. To achieve good Quality of Experience (QoE), it is important to understand the interaction between the two scenes and to display AR contents harmoniously. However, studies on how this superimposition influences human visual attention are lacking. Therefore, in this
paper, we mainly analyze the interaction effect between background (BG) scenes
and AR contents, and study the saliency prediction problem in AR. Specifically,
we first construct a Saliency in AR Dataset (SARD), which contains 450 BG
images, 450 AR images, as well as 1350 superimposed images generated by
superimposing BG and AR images in pairs at three mixing levels. A large-scale eye-tracking experiment with 60 subjects is conducted to collect eye movement
data. To better predict the saliency in AR, we propose a vector quantized
saliency prediction method and generalize it for AR saliency prediction. For
comparison, three benchmark methods are proposed and evaluated together with
our proposed method on our SARD. Experimental results demonstrate the superiority of our proposed method over the benchmark methods on both the common saliency prediction problem and the AR saliency prediction problem. Our data collection methodology, dataset, benchmark methods, and proposed saliency models will be made publicly available to facilitate future research.
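A minimal sketch of how such superimposed stimuli can be generated by alpha-blending a background image with AR content is shown below; the three mixing levels used here are assumptions, not necessarily those used for SARD.

```python
# Minimal sketch (assumed mixing levels): superimposing AR content over a
# background scene at several blending strengths.
import numpy as np


def superimpose(bg, ar, alpha):
    """Blend AR content over a background; both are float arrays in [0, 1]."""
    return np.clip((1.0 - alpha) * bg + alpha * ar, 0.0, 1.0)


bg = np.random.rand(480, 640, 3)
ar = np.random.rand(480, 640, 3)
# Three hypothetical mixing levels, from faint to dominant AR content.
stimuli = [superimpose(bg, ar, a) for a in (0.25, 0.5, 0.75)]
```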